Computing transitive closures

ABSTRACT

Methods, systems, and apparatus, including computer programs encoded on computer storage media, for computing transitive closures of relations. One of the methods includes initializing the transitive closure F and an auxiliary relation of an initial iteration with the tuples in another relation f. New first tuples and new second tuples are iteratively computed until no new first tuples are generated, including: generating new first tuples on each iteration by matching destination elements of tuples in the auxiliary relation of a previous iteration with source elements of tuples in the auxiliary relation of the previous iteration, generating new second tuples on each iteration by matching destination elements of the new first tuples with source elements of tuples in F of the previous iteration, and adding the new first tuples and the new second tuples to F of a current iteration.

BACKGROUND

This specification relates to data processing, in particular, to calculating the transitive closure of a relation.

A relation is a set of tuples (t₁, . . . , t_(n)), each tuple having n ≥ 1 data elements t_(i). Each element t_(i) is a corresponding value, which may represent a value of a corresponding attribute. The attribute will generally have an attribute name. The correspondence between attribute and value is determined by the position of the value in the tuple, i.e., each attribute has a corresponding position. Relations are commonly thought of as, represented as, and referred to as tables in which each row is a tuple and each column is an attribute.

A binary relation is a relation whose tuples have two data elements each, i.e., they are pairs. Given a binary relation f and a tuple (a,b) in that relation, the elements a and b are related by f. For a given tuple (a,b), a is referred to as the source element, and b is referred to as the destination element.

Two elements a and b are transitively related by f when a sequence of tuples in f can be found such that the first tuple in the sequence contains a as a source element, the last tuple in the sequence contains b as a destination element, and every tuple in the sequence has a destination element that matches the source element of the next tuple in the sequence (except for the last tuple in the sequence, which has no next tuple). From a binary relation f, another relation F, referred to as the transitive closure of f, can be generated containing all pairs of elements that are in f or are transitively related by f.

Tuples in a relation f can represent edges between nodes in a directed graph. FIG. 1 illustrates an example directed graph. The graph includes nodes v1 through v7 and directed edges that connect the nodes. The edges in the graph can be represented by a relation f having the following tuples:

-   (v1,v2)
-   (v2,v3)
-   (v3,v4)
-   (v4,v5)
-   (v1,v6)
-   (v6,v7)
-   (v7,v6)
-   (v7,v5)

In this example, the source element of each tuple represents the source node of an edge, and the destination element of the tuple represents the destination node of the edge.

In the present example, v1 and v5 are transitively related because there is a sequence of tuples in f such that the first tuple has v1 as its source element, the last tuple has v5 as its destination element, and every tuple in the sequence has a destination element that matches the source element of the next tuple in the sequence, e.g., the sequence (v1,v2), (v2,v3), (v3,v4), (v4,v5). Nodes v7 and v4 are not transitively related because there is no such sequence of tuples in f. Thus, (v1,v5) is a member of F, and (v7,v4) is not a member of F. In the graph context, two nodes s and d are transitively related if there are edges in the graph such that d is reachable from s. Nodes v1 and v5 are transitively related because node v5 is reachable from node v1. Conversely, v7 and v4 are not transitively related because node v4 is not reachable from node v7.
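As an illustration only, the following Python sketch assumes the relation is held in memory as a set of (source, destination) pairs and computes F for the graph of FIG. 1 by a depth-first reachability search from each source node. This is a standard graph traversal, not the evaluation method described in this specification, and the function name `transitive_closure` is hypothetical.

```python
from collections import defaultdict

# Edges of the directed graph of FIG. 1 as (source, destination) tuples.
f = {("v1", "v2"), ("v2", "v3"), ("v3", "v4"), ("v4", "v5"),
     ("v1", "v6"), ("v6", "v7"), ("v7", "v6"), ("v7", "v5")}


def transitive_closure(rel):
    """Return every pair (s, d) such that d is reachable from s in rel."""
    successors = defaultdict(set)
    for s, d in rel:
        successors[s].add(d)

    closure = set()
    for start in list(successors):
        seen = set()
        stack = list(successors[start])
        while stack:                      # depth-first search from `start`
            node = stack.pop()
            if node in seen:              # the v6/v7 loop would otherwise repeat forever
                continue
            seen.add(node)
            closure.add((start, node))
            stack.extend(successors.get(node, ()))
    return closure


F = transitive_closure(f)
print(("v1", "v5") in F)   # True
print(("v7", "v4") in F)   # False
```

Because of the loop between v6 and v7, pairs such as (v6,v6) and (v7,v7) also appear in F.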

In a sequence of tuples representing transitively related elements in f, e.g., (v1,v2), (v2,v3), (v3,v4), (v4,v5), the sequence can be described as n chained uses of f, or equivalently, n chained uses of a function f(a,b) that operates on the relation f and returns true if (a,b) is in f, where n represents the number of tuples in the sequence (here, n = 4). Thus, when the tuples represent edges in a graph, n chained uses of f represent paths in the graph having a length of n steps.

In this specification, a sequence of tuples having a length n, or n chained uses of a relation f, may be referred to as a path having a length n even when the relation f does not represent a graph. In other words, use of the term path does not necessarily require a graph. Rather, the existence of a path from s to d merely indicates that s and d are transitively related.

Computing transitive closures can be expensive in both the processing time and the storage space required. This is due in part to the exploration of multiple paths that result in duplicate tuples. For example, a computer system computing the transitive closure of f will explore two paths from v1 to v5. The first path goes through v2, and the second path goes through v6. Exploring both of these paths can result in generating (v1,v5) twice. Furthermore, the loop between v6 and v7 means that there are infinitely many paths between v1 and v5, e.g., (v1,v6), (v6,v7), (v7,v6), (v6,v7), (v7,v5) as well as (v1,v6), (v6,v7), (v7,v6), (v6,v7), (v7,v6), (v6,v7), (v7,v6), (v6,v7), (v7,v5).

Furthermore, a system can also end up generating many duplicate tuples due to exploring even a single path in multiple ways. For example, a system can generate the transitive relation (v1,v5) by joining multiple different subpaths, including joining (v1,v2) with (v2,v5); (v1,v3) with (v3,v5); and (v1,v4) with (v4,v5).

The time required to compute the transitive closure of a relation depends significantly on the longest shortest path in f. The transitive closure F will include a tuple for every pair of elements connected by a path in the relation f. Therefore, a computer system will need to explore at least the shortest path between each pair of transitively related elements in f. The longest of these paths, i.e., the longest shortest path, is therefore a limiting factor in how quickly the transitive closure can be computed.

Transitive closures can be computed by a computer system using predicates. In some programming languages, a predicate is a function defined by one or more statements that maps one or more input parameters to true or false. A predicate operates on an associated relation and returns true or false depending on whether a tuple defined by the input parameters occurs in the associated relation. In other words, f(a,b) is true if the tuple (a,b) is in the relation f and false otherwise. For brevity, the name of the predicate, e.g., f, may refer to the predicate function itself or to the associated relation, the meaning of which will be clear from the context.

Thus, a predicate F(s,d) can operate on an associated relation that is the transitive closure of another relation associated with a predicate f(s,d). The predicate F(s,d) then returns true if a tuple (s,d) exists in the associated relation for F and returns false otherwise.

The general manner in which evaluation engines compute associated relations from recursively defined statements will now be described. Recursive statements are statements that reference their own output. An example query language that supports recursive statements is Datalog. The following example statement can be written in Datalog:

-   f(i) :- i=1; (i=2, f(1)); (i=3, f(2))

This statement in Datalog recursively defines a predicate f having input parameter i, which can be expressed as f(i). The :- operator defines the predicate f(i) to have the body “i=1; (i=2, f(1)); (i=3, f(2))”. A semicolon represents disjunction, i.e., logical “or,” and a comma represents conjunction, i.e., logical “and.” For clarity, logical “and” will occasionally be spelled out as an explicit “and” operator.

The semantics of the predicate f(i) in Datalog is that the body of f(i) is evaluated to compute the associated relation for f(i). The relation is the smallest set of values i such that the body of f(i) is satisfied, i.e., evaluates to true. Then, when a value for i is provided as input to the predicate f(i), the predicate evaluates to true if the value occurs in the associated relation for f(i) and false otherwise. For example, f(1) evaluates to “true” because the term “i=1” in the body defines the associated relation to include the tuple (1). Therefore, because the associated relation includes (1), f(1) evaluates to “true.” Evaluation of predicates is typically performed by an evaluation engine for the query language implemented by software installed on one or more computers.

The relation over which f(i) is evaluated may be specified within the body of the predicate. In this example, the body of f(i) defines a relation having a set of singleton tuples, i.e., {(1), (2), (3)}. However, the relation over which f(i) is evaluated may alternatively be specified by another predicate or may be explicitly defined. For example, the relation may be defined by a table in a database.

Evaluating a recursive predicate amounts to computing the least fixed point of the predicate. A fixed point of a predicate is a relation that is unchanged when the body of the predicate is evaluated over it, and the least fixed point is a fixed point whose set of tuples is a subset of every other fixed point of the predicate. Evaluation engines that evaluate recursive predicates can use a number of different procedures for finding the least fixed point.

Some methods for finding the least fixed point of a recursive predicate recast the predicate into a number of nonrecursive evaluation predicates. The evaluation predicates are then evaluated in sequence until a least fixed point is reached. In general, recasting a recursive predicate into a number of nonrecursive evaluation predicates may be referred to as “flattening” the recursion.

An evaluation engine for a particular query language can recast a recursive predicate as follows. A first nonrecursive predicate is defined as false. In addition, a sequence of subsequent nonrecursive predicates is defined according to the body of the recursive predicate. In doing so, the evaluation engine replaces each recursive term with a reference to the previous nonrecursive predicate. Logically, the number of nonrecursive predicates that can be generated is unbounded. However, the evaluation engine will halt evaluation when the least fixed point is reached.

The evaluation engine then evaluates the nonrecursive predicates in order and adds resulting tuples to the associated relation for the predicate. The evaluation engine stops when a nonrecursive predicate is reached whose evaluation adds no additional tuples to the relation. The final result is the associated relation for the recursively defined predicate.

Using this procedure, evaluating each successive predicate regenerates all of the results that have already been generated. Thus, this approach is sometimes referred to as “naive evaluation.”

For simplicity, predicates in this specification will generally be represented logically and may not necessarily have the form of a language construct of any particular query language. However, the implementation of the illustrated logical predicates by an evaluation engine is normally straightforward for query languages that support recursive predicates.

Thus, to illustrate naive evaluation, an evaluation engine can recast the predicate above into the following nonrecursive evaluation predicates:

-   f₀(i) :- false
-   f₁(i) :- i=1; (i=2, f₀(1)); (i=3, f₀(2))
-   f₂(i) :- i=1; (i=2, f₁(1)); (i=3, f₁(2))
-   f₃(i) :- i=1; (i=2, f₂(1)); (i=3, f₂(2))
-   . . .

Or, for brevity, the evaluation predicates may be represented as:

-   f₀(i) :- false
-   f_(n+1)(i) :- i=1; (i=2, f_(n)(1)); (i=3, f_(n)(2))

At first glance, this notation may look like a recursive definition, but it is not. This is because the subscripts of the predicates denote different nonrecursive predicates occurring in the potentially unbounded sequence of predicates. In other words, the predicate f_(n+1) is not recursive because it references f_(n), but not itself. The evaluation engine then evaluates the nonrecursive predicates in order to find the least fixed point.
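To make the sequence f₀, f₁, f₂, . . . concrete, the following sketch hand-codes the body of the example predicate as a Python function over the relation of the previous evaluation predicate and applies it until the relation stops changing. It assumes singleton tuples are represented as plain integers; `body` is a hypothetical name, and no Datalog engine is involved.

```python
def body(prev):
    """Body of f(i) :- i=1; (i=2, f(1)); (i=3, f(2)), with the recursive
    calls f(1) and f(2) answered from `prev`, the relation of f_n."""
    result = {1}                 # i=1
    if 1 in prev:                # (i=2, f_n(1))
        result.add(2)
    if 2 in prev:                # (i=3, f_n(2))
        result.add(3)
    return result


relation = set()                 # f_0(i) :- false, so its relation is empty
history = []
while True:
    nxt = body(relation)         # relation of f_{n+1}
    history.append(sorted(nxt))
    if nxt == relation:          # no change: least fixed point reached
        break
    relation = nxt

print(history)   # [[1], [1, 2], [1, 2, 3], [1, 2, 3]]
```

Evaluating f₄ reproduces the relation of f₃, so {(1), (2), (3)} is the least fixed point.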

An evaluation engine can use naive evaluation to compute a relation representing the transitive closure of f.

FIG. 2 illustrates another example graph. The edges of the graph in FIG. 2 can be represented by a relation f having the following tuples:

-   (v1,v2)
-   (v2,v3)
-   (v3,v4)
-   (v4,v5)
-   (v5,v6)
-   (v6,v7)
-   (v7,v8)

Evaluating the following recursive predicate will compute the transitive closure of f:

-   F(s,d) :- f(s,d); exists(a: f(s,a), F(a,d))

The term “exists(a: f(s,a), F(a,d))” has an existential quantifier. A term having an existential quantifier may be referred to as an existential term. This existential term asserts that there is a data element a such that (s,a) is a tuple in f and (a,d) is a tuple in the transitive closure F. Intuitively, this definition asserts that if (a,d) is in F and (s,a) is in f, then (s,d) is also in F.

This notation can also be thought of as generating a new tuple (s,d) by taking a first tuple (a,d) in F, where (a,d) represents a reachable path in the graph, and extending the reachable path by one more step with (s,a) in f. Or equivalently, this notation can be thought of as extending a one-step path represented by (s,a) in f with another path having one or more steps, represented by (a,d) in F.

An evaluation engine can use naive evaluation to compute the transitive closure F(s,d). To do so, the evaluation engine can generate the following nonrecursive evaluation predicates to flatten the recursive definition of F(s,d):

-   F₀(s,d) :- false
-   F_(n+1)(s,d) :- f(s,d); exists(a: f(s,a), F_(n)(a,d))

Naive evaluation proceeds as illustrated in TABLE 1.

TABLE 1

| Predicate | Previous relation | Current relation | Comments |
|---|---|---|---|
| F₁(s,d) | { } | {(v1,v2), (v2,v3), (v3,v4), (v4,v5), (v5,v6), (v6,v7), (v7,v8)} | The relation of F₀(s,d) is empty. Thus, F₁(s,d) evaluates to f(s,d); exists(a: f(s,a), F₀(a,d)), i.e., (s,d) is in f OR there exists an a such that (s,a) is in f and (a,d) is in { }. Thus, only the tuples of f are generated. |
| F₂(s,d) | relation of F₁ | relation of F₁ plus {(v1,v3), (v2,v4), (v3,v5), (v4,v6), (v5,v7), (v6,v8)} | F₂(s,d) evaluates to f(s,d); exists(a: f(s,a), F₁(a,d)), i.e., (s,d) is in f OR there exists an a such that (s,a) is in f and (a,d) is in F₁. At this point, F₁ includes f itself, so the tuples produced are the tuples in f as well as the tuples in f extended by one step in the graph. Thus, the tuples produced represent up to two steps in the graph, and {(v1,v3), (v2,v4), (v3,v5), (v4,v6), (v5,v7), (v6,v8)} are newly generated. |
| F₃(s,d) | relation of F₂ | relation of F₂ plus {(v1,v4), (v2,v5), (v3,v6), (v4,v7), (v5,v8)} | F₃(s,d) evaluates to f(s,d); exists(a: f(s,a), F₂(a,d)), i.e., (s,d) is in f OR there exists an a such that (s,a) is in f and (a,d) is in F₂. The tuples produced are those in f and those in f extended by a tuple in F₂, which represents up to two steps in the graph. Thus, the new tuples are those that represent three steps in the graph: {(v1,v4), (v2,v5), (v3,v6), (v4,v7), (v5,v8)}. |
| F₄(s,d) | relation of F₃ | relation of F₃ plus {(v1,v5), (v2,v6), (v3,v7), (v4,v8)} | F₄(s,d) evaluates to f(s,d); exists(a: f(s,a), F₃(a,d)). The tuples produced are those in f and those in f extended by a tuple in F₃, which represents up to three steps in the graph. Thus, the new tuples are those that represent four steps in the graph: {(v1,v5), (v2,v6), (v3,v7), (v4,v8)}. |
| F₅(s,d) | relation of F₄ | relation of F₄ plus {(v1,v6), (v2,v7), (v3,v8)} | F₅(s,d) evaluates to f(s,d); exists(a: f(s,a), F₄(a,d)). F₄ represents up to four steps in the graph, so the new tuples are those that represent five steps in the graph: {(v1,v6), (v2,v7), (v3,v8)}. |
| F₆(s,d) | relation of F₅ | relation of F₅ plus {(v1,v7), (v2,v8)} | F₆(s,d) evaluates to f(s,d); exists(a: f(s,a), F₅(a,d)). F₅ represents up to five steps in the graph, so the new tuples are those that represent six steps in the graph: {(v1,v7), (v2,v8)}. |
| F₇(s,d) | relation of F₆ | relation of F₆ plus {(v1,v8)} | F₇(s,d) evaluates to f(s,d); exists(a: f(s,a), F₆(a,d)). F₆ represents up to six steps in the graph, so the new tuple is the one that represents seven steps in the graph: (v1,v8). |
| F₈(s,d) | relation of F₇ | relation of F₇ (unchanged; 28 tuples) | F₈(s,d) evaluates to f(s,d); exists(a: f(s,a), F₇(a,d)). F₇ contains all tuples representing up to seven steps in the graph. Because seven steps is the longest path in the graph, no additional tuples are generated. |

When F₈ is evaluated, no additional tuples are added to the relation. Therefore, the evaluation engine can determine that the least fixed point has been reached. The tuples in F₇ thus represent the transitive closure of f.

The evaluation engine required 8 iterations to compute the transitive closure. In general, when using this strategy, an evaluation engine needs to compute lsp+1 iterations, where lsp represents the length of the longest shortest path in f (here, lsp = 7, the path from v1 to v8).
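The following Python sketch reproduces TABLE 1, assuming relations are in-memory sets of pairs and using the hypothetical function name `naive_closure`. Every iteration recomputes F_(n+1) from scratch by joining f with the whole relation of F_(n), and the loop stops when two consecutive relations are equal.

```python
# Edges of the chain graph of FIG. 2: (v1,v2), (v2,v3), ..., (v7,v8).
f = {(f"v{k}", f"v{k + 1}") for k in range(1, 8)}


def naive_closure(f):
    """Naive evaluation of F(s,d) :- f(s,d); exists(a: f(s,a), F(a,d)).

    Returns the final relation and the number of evaluation predicates
    F_1, F_2, ... that were evaluated.
    """
    prev = set()                  # F_0(s,d) :- false
    iterations = 0
    while True:
        # F_{n+1}: all of f, plus (s,d) whenever (s,a) is in f and (a,d) is in F_n.
        # Everything found so far is regenerated from scratch on each iteration.
        cur = set(f) | {(s, d) for (s, a) in f for (b, d) in prev if a == b}
        iterations += 1
        if cur == prev:           # F_{n+1} equals F_n: least fixed point
            return cur, iterations
        prev = cur


F, iterations = naive_closure(f)
print(len(F), iterations)   # 28 8
```

For the chain of FIG. 2, the sketch reports 28 tuples after 8 iterations, i.e., lsp+1 with lsp = 7.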

Using naive evaluation also generated many duplicate tuples. In particular, evaluation of the recursive definition of F required regenerating all of the tuples that had already been generated on every single iteration. Thus, even for mildly complicated data sets, naive evaluation is very expensive.

Another prior art procedure for evaluating recursive predicates is referred to as “semi-naive evaluation.” When using semi-naive evaluation, an evaluation engine flattens the recursion of the predicate in a different way than naive evaluation. In particular, the evaluation engine defines a delta predicate whose associated relation is defined to include only the new tuples found on each iteration. The least fixed point is found when an iteration is reached in which the delta predicate's associated relation is empty.

Evaluating the previous definition of F with semi-naive evaluation would avoid some of the unnecessary generation of duplicate tuples, but it would still require the same number of iterations (lsp+1) to find the transitive closure.
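A sketch of semi-naive evaluation of the same definition shows why the iteration count does not improve: each delta can extend paths by only one edge of f. It assumes in-memory sets of pairs, and `semi_naive_linear` is a hypothetical name.

```python
# Edges of the chain graph of FIG. 2.
f = {(f"v{k}", f"v{k + 1}") for k in range(1, 8)}


def semi_naive_linear(f):
    """Semi-naive evaluation of F(s,d) :- f(s,d); exists(a: f(s,a), F(a,d)).

    delta_{n+1} = (f  union  {(s,d) : (s,a) in f, (a,d) in delta_n})  minus  F_n
    F_{n+1}     = F_n  union  delta_{n+1}
    """
    F, delta = set(), set()       # F_0 and delta_0 are empty
    delta_sizes = []
    while True:
        # Only the tuples that were new last round (delta_n) are joined with f.
        candidates = set(f) | {(s, d) for (s, a) in f for (b, d) in delta if a == b}
        delta = candidates - F    # "not F_n(s,d)"
        delta_sizes.append(len(delta))
        if not delta:             # first empty delta: least fixed point
            return F, delta_sizes
        F = F | delta


F, delta_sizes = semi_naive_linear(f)
print(delta_sizes)   # [7, 6, 5, 4, 3, 2, 1, 0]
```

Eight delta predicates are evaluated before the first empty one is found, the same lsp+1 count as naive evaluation, although each tuple is now derived only once per path length.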

An alternative definition for computing the transitive closure of f is given by the following recursively defined predicate:

-   F(s,d) :- f(s,d); exists(a: F(s,a), F(a,d))

When using either naive or semi-naive evaluation, this definition results in fewer iterations than the definition illustrated in TABLE 1.

Intuitively, this definition asserts that if (a,d) is in F and (s,a) is in F, then (s,d) is also in F. In other words, this notation can be thought of as generating a new tuple (s,d) by extending a path represented by a tuple (s,a) in F by a path represented by another tuple (a,d) that is also in F.

To illustrate using semi-naive evaluation to find the transitive closure, an evaluation engine can generate the following evaluation predicates:

-   δ₀(s,d) :- false
-   F₀(s,d) :- false
-   δ_(n+1)(s,d) :- (f(s,d); exists(a: F_(n)(s,a), δ_(n)(a,d)); exists(a: δ_(n)(s,a), F_(n)(a,d))), not F_(n)(s,d)
-   F_(n+1)(s,d) :- F_(n)(s,d); δ_(n+1)(s,d)

As mentioned above, semi-naive evaluation uses an evaluation predicate that is referred to as a delta predicate. A system can generate the delta predicate by replacing recursive calls in the original predicate with nonrecursive calls to the previous delta predicate. Where a single disjunct contains multiple recursive calls, as in this example, the disjunct is duplicated once for each recursive call, and in each copy a different recursive call is replaced with the previous delta predicate while the remaining calls refer to the previous F_(n), as shown above. The system then generates a conjunction of the result with a negation of the predicate from the previous iteration. Thus, the delta predicate is defined to include only new tuples found in a particular iteration of the evaluation. The term “not F_(n)(s,d)” at the end of the definition for δ_(n+1)(s,d) indicates that previously found tuples do not satisfy the delta predicate δ_(n+1)(s,d).

Evaluation of the transitive closure using semi-naive evaluation is illustrated in TABLE 2. An evaluation engine need not compare a previous relation to a current relation as was done for naive evaluation. Rather, the evaluation engine can halt when the first empty delta predicate is found.

TABLE 2

| Predicate | Relation | Comments |
|---|---|---|
| δ₀(s,d) | { } | Empty by definition. |
| F₀(s,d) | { } | Empty by definition. |
| δ₁(s,d) | {(v1,v2), (v2,v3), (v3,v4), (v4,v5), (v5,v6), (v6,v7), (v7,v8)} | δ₁(s,d) evaluates to (f(s,d); exists(a: F₀(s,a), δ₀(a,d)); exists(a: δ₀(s,a), F₀(a,d))), not F₀(s,d). Because there are no tuples in F₀ or δ₀, only the tuples in f are generated. |
| F₁(s,d) | {(v1,v2), (v2,v3), (v3,v4), (v4,v5), (v5,v6), (v6,v7), (v7,v8)} | F₁(s,d) evaluates to F₀(s,d); δ₁(s,d), i.e., (s,d) is in { } OR (s,d) is in δ₁. Thus, the tuples in f are generated. |
| δ₂(s,d) | {(v1,v3), (v2,v4), (v3,v5), (v4,v6), (v5,v7), (v6,v8)} | δ₂(s,d) evaluates to (f(s,d); exists(a: F₁(s,a), δ₁(a,d)); exists(a: δ₁(s,a), F₁(a,d))), not F₁(s,d). The tuples produced join a tuple in F₁ with a tuple in δ₁, or a tuple in δ₁ with a tuple in F₁, excluding tuples already in F₁. Since F₁ and δ₁ are both equal to f, the new tuples are those that represent two steps in the graph. |
| F₂(s,d) | relation of F₁ plus the 6 tuples of δ₂ (13 tuples) | F₂(s,d) evaluates to F₁(s,d); δ₂(s,d), i.e., (s,d) is in F₁ OR (s,d) is in δ₂. |
| δ₃(s,d) | {(v1,v4), (v1,v5), (v2,v5), (v2,v6), (v3,v6), (v3,v7), (v4,v7), (v4,v8), (v5,v8)} | δ₃(s,d) evaluates to (f(s,d); exists(a: F₂(s,a), δ₂(a,d)); exists(a: δ₂(s,a), F₂(a,d))), not F₂(s,d). The tuples produced form a transitive relation from a first tuple in F₂ and a second tuple in δ₂, or from a first tuple in δ₂ and a second tuple in F₂. F₂ includes tuples that represent one and two steps in the graph, and δ₂ includes tuples that represent two steps, so the new tuples represent three and four steps in the graph. |
| F₃(s,d) | relation of F₂ plus the 9 tuples of δ₃ (22 tuples, representing one to four steps) | F₃(s,d) evaluates to F₂(s,d); δ₃(s,d), i.e., (s,d) is in F₂ OR (s,d) is in δ₃. |
| δ₄(s,d) | {(v1,v6), (v1,v7), (v1,v8), (v2,v7), (v2,v8), (v3,v8)} | δ₄(s,d) evaluates to (f(s,d); exists(a: F₃(s,a), δ₃(a,d)); exists(a: δ₃(s,a), F₃(a,d))), not F₃(s,d). F₃ includes tuples that represent one to four steps in the graph, and δ₃ includes tuples that represent three or four steps, so transitively combining them yields paths of four to eight steps. After the tuples already in F₃ are excluded, the new tuples represent five, six, or seven steps; seven steps is the longest path in the graph. |
| F₄(s,d) | relation of F₃ plus the 6 tuples of δ₄ (all 28 tuples of the transitive closure) | F₄(s,d) evaluates to F₃(s,d); δ₄(s,d), i.e., (s,d) is in F₃ OR (s,d) is in δ₄. |

Using this alternate definition of the transitive closure required only three iterations. In general, an evaluation engine using this technique to compute the transitive closure needs to compute O(log₂(lsp)) iterations. Therefore, it is generally more efficient, in terms of iterations required, than the previous definition, which requires lsp+1 iterations under either naive or semi-naive evaluation.
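The following Python sketch of semi-naive evaluation of the alternative definition reproduces the δ relations of TABLE 2 and stops at the first empty delta. It assumes in-memory sets of pairs; the helper names `compose` and `semi_naive_doubling` are hypothetical.

```python
# Edges of the chain graph of FIG. 2.
f = {(f"v{k}", f"v{k + 1}") for k in range(1, 8)}


def compose(r1, r2):
    """All (s, d) such that (s, a) is in r1 and (a, d) is in r2 for some a."""
    by_source = {}
    for a, d in r2:
        by_source.setdefault(a, []).append(d)
    return {(s, d) for (s, a) in r1 for d in by_source.get(a, ())}


def semi_naive_doubling(f):
    """Semi-naive evaluation of F(s,d) :- f(s,d); exists(a: F(s,a), F(a,d))."""
    F, delta = set(), set()              # F_0 and delta_0 are empty
    deltas = []
    while True:
        # delta_{n+1} :- (f; F_n joined with delta_n; delta_n joined with F_n), not F_n
        delta = (set(f) | compose(F, delta) | compose(delta, F)) - F
        if not delta:                    # first empty delta: stop
            return F, deltas
        deltas.append(delta)
        F = F | delta                    # F_{n+1} :- F_n; delta_{n+1}


F, deltas = semi_naive_doubling(f)
print([len(d) for d in deltas])   # [7, 6, 9, 6]
```

For FIG. 2, the nonempty deltas have 7, 6, 9, and 6 tuples, matching δ₁ through δ₄ in TABLE 2, and the next delta, δ₅, is empty.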

However, this strategy still generates many duplicate tuples. Consider the tuples generated for s=v1 when δ₄(s,d) is evaluated from F₃ and δ₃. The tuples generated during this iteration are illustrated in TABLE 3:

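TABLE 3

| Disjunct used | Tuple generated | Generated from |
|---|---|---|
| exists(a: F₃(s,a), δ₃(a,d)) | (v1,v5) | (v1,v2), (v2,v5) |
| exists(a: F₃(s,a), δ₃(a,d)) | (v1,v6) | (v1,v2), (v2,v6) |
| exists(a: F₃(s,a), δ₃(a,d)) | (v1,v6) | (v1,v3), (v3,v6) |
| exists(a: F₃(s,a), δ₃(a,d)) | (v1,v7) | (v1,v3), (v3,v7) |
| exists(a: F₃(s,a), δ₃(a,d)) | (v1,v7) | (v1,v4), (v4,v7) |
| exists(a: F₃(s,a), δ₃(a,d)) | (v1,v8) | (v1,v4), (v4,v8) |
| exists(a: F₃(s,a), δ₃(a,d)) | (v1,v8) | (v1,v5), (v5,v8) |
| exists(a: δ₃(s,a), F₃(a,d)) | (v1,v5) | (v1,v4), (v4,v5) |
| exists(a: δ₃(s,a), F₃(a,d)) | (v1,v6) | (v1,v4), (v4,v6) |
| exists(a: δ₃(s,a), F₃(a,d)) | (v1,v7) | (v1,v4), (v4,v7) |
| exists(a: δ₃(s,a), F₃(a,d)) | (v1,v8) | (v1,v4), (v4,v8) |
| exists(a: δ₃(s,a), F₃(a,d)) | (v1,v6) | (v1,v5), (v5,v6) |
| exists(a: δ₃(s,a), F₃(a,d)) | (v1,v7) | (v1,v5), (v5,v7) |
| exists(a: δ₃(s,a), F₃(a,d)) | (v1,v8) | (v1,v5), (v5,v8) |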

On this iteration, only three new tuples were produced for v1: (v1,v6), (v1,v7), and (v1,v8); (v1,v5) was already in F₃. However, 14 tuples were generated in total: (v1,v5) was generated twice, and (v1,v6), (v1,v7), and (v1,v8) were each generated four times.
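The duplicate counts can be checked mechanically. The following sketch builds F₃ and δ₃ for the chain of FIG. 2 with a hypothetical helper `chain_pairs` (named F3 and delta3 in the code), then evaluates both existential disjuncts for s = v1 as a bag using `collections.Counter`, so that duplicates are counted rather than merged.

```python
from collections import Counter


def chain_pairs(lengths, n=8):
    """Pairs (vi, vj) of the FIG. 2 chain v1 -> ... -> vn whose path length j - i is in `lengths`."""
    return {(f"v{i}", f"v{j}")
            for i in range(1, n + 1)
            for j in range(i + 1, n + 1)
            if j - i in lengths}


F3 = chain_pairs(range(1, 5))       # F_3: paths of one to four steps
delta3 = chain_pairs(range(3, 5))   # delta_3: paths of three or four steps

generated = Counter()               # a bag: counts every derivation, duplicates included
for s, a in F3:                     # disjunct exists(a: F_3(s,a), delta_3(a,d))
    if s == "v1":
        generated.update((s, d) for (b, d) in delta3 if b == a)
for s, a in delta3:                 # disjunct exists(a: delta_3(s,a), F_3(a,d))
    if s == "v1":
        generated.update((s, d) for (b, d) in F3 if b == a)

new = {t for t in generated if t not in F3}   # what survives "not F_3(s,d)"
print(sorted(generated.items()))
print(sorted(new))
```

The bag holds 14 derivations for v1, of which only (v1,v6), (v1,v7), and (v1,v8) are not already in F₃.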

In general, this approach generates a quadratic number of duplicate tuples, but only a linear number of new tuples, on each iteration.

SUMMARY

This specification describes technologies relating to computing the transitive closure of a relation more efficiently in terms of iterations required and duplicate tuples produced.

The subject matter described in this specification can be implemented in particular embodiments so as to realize one or more of the following advantages. The transitive closure of a relation can be computed more quickly and using fewer computational resources. The transitive closure can be computed in O(log₂(lsp)) time while also reducing or eliminating the duplicate tuples that are generated.

The details of one or more embodiments of the subject matter of this specification are set forth in the accompanying drawings and the description below. Other features, aspects, and advantages of the subject matter will become apparent from the description, the drawings, and the claims.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 illustrates an example directed graph.

FIG. 2 illustrates another example graph.

FIG. 3 is a flow chart of an example process for computing the transitive closure of a relation.

Like reference numbers and designations in the various drawings indicate like elements.

DETAILED DESCRIPTION

This specification describes improved ways of computing the transitive closure of a relation. This specification also describes mechanisms for evaluating a relation representing a transitive closure.

An example application for an evaluation engine that can compute a transitive closure of a relation occurs in static analysis systems. Static analysis refers to techniques for analyzing computer software source code without executing the source code as a software program. A static analysis system can use a query language to determine a variety of attributes about source code in a code base.

In static analysis systems, it is useful to represent control flows of a computer program. For example, a software developer might want to know if a function A defined in a code base ever calls another function B, either directly or indirectly. Similarly, a software developer also might want to know if a variable is always initialized before it is used, or whether a control flow exists in the software in which the variable is used before it is initialized.

In very large code bases, determining an answer to these questions using manual inspection or simple text searching may be difficult or impossible. In addition, actually running the software may not yield a correct answer to the question if the testing procedures omit control flows that nevertheless exist in the software.

A static analysis system can solve this problem by processing source code in a code base to generate a relation f representing relationships between source code elements. For example, the relation can include tuples (p,c) where p represents a calling function and c represents a called function. In large code bases, such a relation may include millions of such tuples.

The static analysis system can then compute the transitive closure of f. After doing so, to determine whether a control flow exists in the software in which the function B is called, directly or indirectly, by the function A, the system merely needs to determine whether or not (A,B) exists in the transitive closure of f.

Transitive closures are useful in many other applications and industries. For example, some digital authentication mechanisms, e.g., Pretty Good Privacy (PGP), allow a first user to digitally sign a public key of a second user. Doing so signifies that the first user trusts the second user. If a relation contains information about which users trust which other users, a transitive closure of the relation can represent chains of trust among the users. Thus, if the second user trusts a third user, the first user can or should be expected to also trust the third user. Therefore, the transitive closure can reveal such chains of trust among the users.

FIG. 3 is a flow chart of an example process for computing the transitive closure of a relation. In general, the system will iteratively add tuples to a relation F_(n) for each iteration n. The process will be described as being performed by an appropriately programmed system of one or more computers.

The system receives a request to compute the transitive closure of a relation (310). As described above, computing the transitive closure of a relation f generates a new relation F. The system can receive a request from a user who specifies the relation f. The system can also receive a request to automatically generate transitive closures of relations of source code elements as part of a static analysis process.

The system adds the tuples in f to both F₁ and ψ₁ (320). Adding the tuples in f to F₁ initializes the transitive closure relation.

While computing the transitive closure, the system will also make use of an auxiliary relation ψ_(n+1) on each iteration. In general, on each iteration n the relation ψ_(n+1) includes tuples representing shortest paths having 2^(n) steps. Thus, the length of the paths represented by tuples in ψ_(n+1) doubles on each iteration. The initial auxiliary relation ψ₁ represents shortest paths having 2^(1−1) = 1 step, so the system calculates ψ₁ by simply adding the tuples of f to it.

The system generates new first tuples in ψ_(n+1) by matching destination elements of tuples in ψ_(n) with source elements of tuples in ψ_(n), and excluding any tuples in F_(n) (330). In other words, the system determines which tuples in ψ_(n) have a destination element that matches a source element of a tuple in ψ_(n). For each match of a tuple (s,a) with a tuple (a,d), the system generates the tuple (s,d), unless (s,d) is already in F_(n).

The new first tuples for iteration n will be tuples representing 2^(n) chained uses of f. If the tuples represent edges in a graph, conceptually this step can be thought of as extending paths in the graph represented by tuples in ψ_(n) with paths also represented by the tuples in ψ_(n). Because the tuples in ψ_(n) represent paths having length 2^(n−1), the new first tuples in ψ_(n+1) will represent a doubling of those path lengths, resulting in paths having a length 2^(n).
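Steps 320 and 330 can be sketched in Python as follows, assuming relations are held in memory as sets of (source, destination) pairs. The helper names `index_by_source` and `new_first_tuples` are hypothetical. Indexing ψ_(n) by source element means each tuple (s,a) is matched only against tuples whose source is a, rather than against every tuple in ψ_(n).

```python
from collections import defaultdict
from collections.abc import Hashable
from typing import Dict, Set, Tuple

Pair = Tuple[Hashable, Hashable]


def index_by_source(rel: Set[Pair]) -> Dict[Hashable, Set[Hashable]]:
    """Map each source element to the set of its destination elements."""
    index: Dict[Hashable, Set[Hashable]] = defaultdict(set)
    for s, d in rel:
        index[s].add(d)
    return index


def new_first_tuples(psi_n: Set[Pair], F_n: Set[Pair]) -> Set[Pair]:
    """psi_{n+1}(s,d) :- exists(a: psi_n(s,a), psi_n(a,d)), not F_n(s,d)."""
    by_source = index_by_source(psi_n)
    joined = {(s, d) for (s, a) in psi_n for d in by_source.get(a, ())}
    return joined - F_n  # exclude pairs already in the closure


# Step 320: initialize F_1 and psi_1 with the tuples of f (a chain v1 -> ... -> v5).
f = {(f"v{i}", f"v{i + 1}") for i in range(1, 5)}
F_1, psi_1 = set(f), set(f)

# Step 330: the new first tuples represent 2-step paths.
psi_2 = new_first_tuples(psi_1, F_1)
print(sorted(psi_2))  # [('v1', 'v3'), ('v2', 'v4'), ('v3', 'v5')]
```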

The system determines whether new first tuples have been produced on this iteration (340). If not, the system provides an indication that F_(n) is the transitive closure of f (branch to 350).

If new first tuples have been produced, the system generates new second tuples by matching destination elements of the new first tuples in ψ_(n+1) with source elements of the tuples in F_(n) (branch to 360). In other words, the system uses the newly generated first tuples of ψ_(n+1) to generate new second tuples.

To do so, the system determines which of the new first tuples in ψ_(n+1) have a destination element that matches a source element of a tuple in F_(n). Note that F_(n) has not been updated to include the new first tuples.

If the tuples represent edges in a graph, this step can be thought of as extending paths in the graph represented by the new first tuples in ψ_(n+1) with paths represented by the tuples in F_(n). Doing so generates new second tuples that represent paths having a length greater than 2^(n) steps and less than 2^(n+1) steps.
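A matching sketch of step 360, under the same assumptions (in-memory sets of pairs, hypothetical helper name `new_second_tuples`), uses a small chain whose result can be checked by hand. With n = 1, the 2-step first tuples are extended by 1-step tuples of F₁, giving 3-step tuples.

```python
from collections import defaultdict
from collections.abc import Hashable
from typing import Dict, Set, Tuple

Pair = Tuple[Hashable, Hashable]


def new_second_tuples(first: Set[Pair], F_n: Set[Pair]) -> Set[Pair]:
    """exists(a: psi_{n+1}(s,a), F_n(a,d)): extend each new first tuple by a tuple of F_n."""
    by_source: Dict[Hashable, Set[Hashable]] = defaultdict(set)
    for a, d in F_n:
        by_source[a].add(d)
    return {(s, d) for (s, a) in first for d in by_source.get(a, ())}


# Chain v1 -> v2 -> ... -> v6 at iteration n = 1:
# F_1 holds the 1-step pairs and psi_2 holds the 2-step pairs.
F_1 = {(f"v{i}", f"v{i + 1}") for i in range(1, 6)}
psi_2 = {(f"v{i}", f"v{i + 2}") for i in range(1, 5)}

second = new_second_tuples(psi_2, F_1)
print(sorted(second))  # [('v1', 'v4'), ('v2', 'v5'), ('v3', 'v6')]
```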

The system adds the tuples in F_(n), the new first tuples, and the new second tuples to F_(n+1) (370). In other words, the system defines a new relation F_(n+1). In practice, the system need not generate an entirely new relation. Rather, the system can merely add the new first tuples and the new second tuples to the relation F_(n) and designate the resulting relation as F_(n+1).

In the graph example, the sets of tuples that make up F_(n+1) respectively represent (1) paths having length less than 2^(n) for F_(n), (2) paths having length of exactly 2^(n) for ψ_(n+1), and (3) paths having length greater than 2^(n) but less than 2^(n+1) for the tuples generated by extending ψ_(n+1) by the tuples in F_(n). Combined together, these three sets represent all pairs of elements separated by shortest paths of length less than 2^(n+1).
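The three-part composition of F_(n+1) can be checked against breadth-first-search distances. The Python sketch below uses hypothetical helper names and a made-up example graph with a branch and a loop; on every iteration it asserts that the new first tuples are exactly the pairs whose shortest path has 2^(n) steps, and that F_(n+1) is exactly the set of pairs whose shortest path has fewer than 2^(n+1) steps. A path length here counts uses of f, so a pair (s,s) is included only if s lies on a cycle.

```python
from collections import defaultdict, deque
from collections.abc import Hashable
from typing import Dict, Set, Tuple

Pair = Tuple[Hashable, Hashable]


def join(left: Set[Pair], right: Set[Pair]) -> Set[Pair]:
    """All (s, d) such that (s, a) is in left and (a, d) is in right."""
    by_source: Dict[Hashable, Set[Hashable]] = defaultdict(set)
    for a, d in right:
        by_source[a].add(d)
    return {(s, d) for (s, a) in left for d in by_source.get(a, ())}


def shortest_lengths(f: Set[Pair]) -> Dict[Pair, int]:
    """BFS from every source: number of f-steps (at least 1) on a shortest path s -> d."""
    succ: Dict[Hashable, Set[Hashable]] = defaultdict(set)
    for s, d in f:
        succ[s].add(d)
    lengths: Dict[Pair, int] = {}
    for s in list(succ):
        queue = deque((d, 1) for d in succ[s])
        seen: Set[Hashable] = set()
        while queue:
            node, k = queue.popleft()
            if node in seen:
                continue
            seen.add(node)
            lengths[(s, node)] = k
            queue.extend((nxt, k + 1) for nxt in succ.get(node, ()) if nxt not in seen)
    return lengths


def check_path_lengths(f: Set[Pair]) -> int:
    """Assert the path-length invariants on every iteration; return the iteration count."""
    dist = shortest_lengths(f)
    F, psi, n = set(f), set(f), 1
    while psi:
        first = join(psi, psi) - F        # new first tuples, psi_{n+1}
        F = F | first | join(first, F)    # F_{n+1} = F_n + first + second tuples
        assert first == {p for p, k in dist.items() if k == 2 ** n}
        assert F == {p for p, k in dist.items() if k < 2 ** (n + 1)}
        psi, n = first, n + 1
    return n - 1


# Hypothetical graph with a branch at 2 (to 3 and 5) and a loop 2 -> 3 -> 4 -> 2.
f = {(1, 2), (2, 3), (3, 4), (4, 2), (2, 5), (5, 6), (6, 7), (7, 8), (8, 9)}
print(check_path_lengths(f))  # 3
```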

This approach reduces or eliminates duplicate tuples for the following reasons. On iteration n, the relation ψ_(n+1) has tuples representing shortest paths of length 2^(n). A tuple (a,c) in ψ_(n+1) is generated by joining a tuple (a,b) in ψ_(n) with a tuple (b,c) in ψ_(n), both of which are also in F_(n), and it represents the path from a to c through b. That path is necessarily a shortest path from a to c, because if a shorter path existed, (a,c) would already be in F_(n) and therefore would not be in ψ_(n+1), since tuples in F_(n) are explicitly excluded from ψ_(n+1).

Then, the system extends the paths in ψ_(n+1) with all shortest paths having length less than 2^(n), resulting in all shortest paths having a length between 2^(n) and 2^(n+1). Notably, the system does not need to explore or extend any paths having a length less than 2^(n) because all of those paths were explored on a previous iteration. In other words, once a tuple (s,d) is added to F_(n), the path that generated (s,d) is not explored again.

Therefore, the system only generates tuples representing shortest paths, which eliminates much, if not most, duplicate generation when computing the transitive closure. In this example, all duplicates are eliminated.

In general, duplicates are only generated due to branching or loops in the tuples of f. In FIG. 1, for example, (v1,v5) would be generated twice, due to the path through v2, v3, and v4 and the path through v6 and v7. And (v1,v6) would be generated twice, due to (v1,v6) in f and the path (v1,v6), (v6,v7), (v7,v6).
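The effect of branching and loops can be measured by counting derivations. The sketch below is a Python illustration under an assumed definition of a duplicate: every tuple of f counts as one derivation, every raw output of a join counts as one derivation per matching intermediate element a (including outputs later filtered by "not F_(n)"), and the duplicate count is the number of derivations minus the number of distinct tuples in the final F. The helper names and the three example relations are hypothetical; FIG. 1 itself is not reproduced.

```python
from collections import Counter, defaultdict
from collections.abc import Hashable
from typing import Set, Tuple

Pair = Tuple[Hashable, Hashable]


def join_counted(left: Set[Pair], right: Set[Pair]) -> Counter:
    """Join on the shared element a, keeping one entry per (s, a, d) derivation."""
    by_source = defaultdict(list)
    for a, d in right:
        by_source[a].append(d)
    return Counter((s, d) for (s, a) in left for d in by_source.get(a, ()))


def count_duplicates(f: Set[Pair]) -> int:
    """Total derivations (tuples of f plus raw join outputs) minus distinct tuples in F."""
    F, psi = set(f), set(f)
    derivations = len(f)
    while True:
        raw_first = join_counted(psi, psi)
        derivations += sum(raw_first.values())
        first = set(raw_first) - F            # psi_{n+1}
        if not first:
            return derivations - len(F)
        raw_second = join_counted(first, F)
        derivations += sum(raw_second.values())
        F = F | first | set(raw_second)
        psi = first


chain = {(f"v{i}", f"v{i + 1}") for i in range(1, 8)}        # no branches or loops
diamond = {("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")}    # one branch
loop = {(1, 2), (2, 1)}                                       # one loop
print(count_duplicates(chain), count_duplicates(diamond), count_duplicates(loop))  # 0 1 4
```

In the diamond, (a,d) is derived once through b and once through c; the chain, which has neither branches nor loops, produces no duplicates under this definition.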

The proportion of duplicates that are still generated depends on the type of data represented by the relation. For example, computer software tends to have long sequences of sequentially evaluated expressions and statements, with relatively few branches and loops. Therefore, where the data represents the control flow of a computer program, as is the case for static analysis, almost all of the duplication will be eliminated.

To efficiently calculate the transitive closure of f, an evaluation engine can perform the example process by evaluating the following predicates for n = 1, 2, etc., until ψ_(n+1) is empty, at which point F_(n) will be the transitive closure of f:

-   ψ₁(s,d) :- f(s,d)
-   F₁(s,d) :- f(s,d)
-   ψ_(n+1)(s,d) :- exists(a: ψ_(n)(s,a), ψ_(n)(a,d)), not F_(n)(s,d)
-   F_(n+1)(s,d) :- F_(n)(s,d); ψ_(n+1)(s,d); exists(a: ψ_(n+1)(s,a), F_(n)(a,d))

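Computing the ψ_(n+1)(s,d) predicate represents computing the new first tuples on each iteration. The existential term "exists(a: ψ_(n+1)(s,a), F_(n)(a,d))" represents computing the new second tuples on each iteration. The semicolons in the F_(n+1) predicate denote disjunction, so F_(n+1) is the union of F_(n), the new first tuples, and the new second tuples.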
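A minimal sketch of evaluating these predicates with in-memory Python sets follows. It assumes f fits in memory, and the function names `join` and `transitive_closure` are hypothetical; a real evaluation engine would use its own relation storage and join operators. Each line of the loop corresponds to one of the predicates above.

```python
from collections import defaultdict
from collections.abc import Hashable
from typing import Dict, Set, Tuple

Pair = Tuple[Hashable, Hashable]


def join(left: Set[Pair], right: Set[Pair]) -> Set[Pair]:
    """exists(a: left(s,a), right(a,d)), computed as a hash join on a."""
    by_source: Dict[Hashable, Set[Hashable]] = defaultdict(set)
    for a, d in right:
        by_source[a].add(d)
    return {(s, d) for (s, a) in left for d in by_source.get(a, ())}


def transitive_closure(f: Set[Pair]) -> Tuple[Set[Pair], int]:
    """Evaluate the psi/F predicates for n = 1, 2, ... until psi_{n+1} is empty.

    Returns the final F_n and the number of loop iterations run, including the
    last one, whose psi_{n+1} is empty.
    """
    F = set(f)    # F_1(s,d) :- f(s,d)
    psi = set(f)  # psi_1(s,d) :- f(s,d)
    iterations = 0
    while True:
        iterations += 1
        first = join(psi, psi) - F     # psi_{n+1}: ..., not F_n(s,d)
        if not first:                  # no new first tuples: F_n is the closure
            return F, iterations
        second = join(first, F)        # exists(a: psi_{n+1}(s,a), F_n(a,d))
        F = F | first | second         # F_{n+1}: F_n; psi_{n+1}; second tuples
        psi = first


F, iterations = transitive_closure({(1, 2), (2, 3), (3, 1)})
print(len(F), iterations)  # 9 2
```

Keying the hash index on the join element keeps each join proportional to the number of matching pairs rather than to the product of the relation sizes; this is one reasonable design choice, not the only one.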

Evaluation of the transitive closure using these predicates is illustrated below in TABLE 4. This example uses the relation f whose tuples represent the edges of the graph shown in FIG. 2. Notably, using these predicates results in the transitive closure being generated in O(log₂(lsp)) iterations, where lsp is the length of the longest shortest path in the graph, with no duplicate tuples being generated at all.

TABLE 4 (columns: Predicate, Relation, Comments)

-   ψ₁(s,d)
    Relation: {(v1, v2), (v2, v3), (v3, v4), (v4, v5), (v5, v6), (v6, v7), (v7, v8)}
    Comments: ψ₁(s,d) evaluates to: f(s,d)
-   F₁(s,d)
    Relation: {(v1, v2), (v2, v3), (v3, v4), (v4, v5), (v5, v6), (v6, v7), (v7, v8)}
    Comments: F₁(s,d) evaluates to: f(s,d)
-   ψ₂(s,d)
    Relation: {(v1, v3), (v2, v4), (v3, v5), (v4, v6), (v5, v7), (v6, v8)}
    Comments: ψ₂(s,d) evaluates to: exists(a: ψ₁(s,a), ψ₁(a,d)), not F₁(s,d). The tuples in ψ₂ represent paths in ψ₁ extended by paths in ψ₁. They represent shortest paths that are 2¹ steps long, which in this case is 2 steps in the graph.
-   F₂(s,d)
    Relation: {(v1, v2), (v2, v3), (v3, v4), (v4, v5), (v5, v6), (v6, v7), (v7, v8), (v1, v3), (v2, v4), (v3, v5), (v4, v6), (v5, v7), (v6, v8), (v1, v4), (v2, v5), (v3, v6), (v4, v7), (v5, v8)}
    Comments: F₂(s,d) evaluates to: F₁(s,d); ψ₂(s,d); exists(a: ψ₂(s,a), F₁(a,d)). The tuples produced are those in F₁, those in ψ₂, and those obtained by extending a tuple in ψ₂ with a tuple in F₁. We have F₁: {(v1, v2), (v2, v3), (v3, v4), (v4, v5), (v5, v6), (v6, v7), (v7, v8)} and ψ₂: {(v1, v3), (v2, v4), (v3, v5), (v4, v6), (v5, v7), (v6, v8)}. For the last term, because F₁ includes only the tuples in f, ψ₂ is only extended by single steps in the graph. Since ψ₂ includes tuples representing two steps in the graph, the last term yields tuples representing three steps in the graph: {(v1, v4), (v2, v5), (v3, v6), (v4, v7), (v5, v8)}. In other words, F₁ includes tuples representing one step in the graph, ψ₂ includes tuples representing two steps, and the last term generates tuples representing three steps. Thus, F₂ now has tuples representing one, two, and three steps in the graph.
-   ψ₃(s,d)
    Relation: {(v1, v5), (v2, v6), (v3, v7), (v4, v8)}
    Comments: ψ₃(s,d) evaluates to: exists(a: ψ₂(s,a), ψ₂(a,d)), not F₂(s,d). The tuples in ψ₃ represent paths in ψ₂ extended by paths in ψ₂, i.e., paths that are 2² steps long, which in this case is 4 steps in the graph.
-   F₃(s,d)
    Relation: {(v1, v2), (v2, v3), (v3, v4), (v4, v5), (v5, v6), (v6, v7), (v7, v8), (v1, v3), (v2, v4), (v3, v5), (v4, v6), (v5, v7), (v6, v8), (v1, v4), (v2, v5), (v3, v6), (v4, v7), (v5, v8), (v1, v5), (v2, v6), (v3, v7), (v4, v8), (v1, v6), (v1, v7), (v1, v8), (v2, v7), (v2, v8), (v3, v8)}
    Comments: F₃(s,d) evaluates to: F₂(s,d); ψ₃(s,d); exists(a: ψ₃(s,a), F₂(a,d)). The tuples produced are those in F₂, those in ψ₃, and those obtained by extending a tuple in ψ₃ with a tuple in F₂. As described above, F₂ includes all tuples representing one, two, and three steps in the graph, and ψ₃ includes tuples that represent four steps. Thus, the four-step tuples in ψ₃ are extended by the one-, two-, and three-step tuples in F₂, which results in all tuples representing five, six, and seven steps in the graph: {(v1, v6), (v1, v7), (v1, v8), (v2, v7), (v2, v8), (v3, v8)}
-   ψ₄(s,d)
    Relation: { }
    Comments: ψ₄(s,d) evaluates to: exists(a: ψ₃(s,a), ψ₃(a,d)), not F₃(s,d). The tuples in ψ₄ represent paths in ψ₃ extended by paths in ψ₃, i.e., paths having length 2³ = 8 steps. Since there are no such 8-step paths in the graph, ψ₄(s,d) is empty.
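TABLE 4 can be reproduced mechanically. The Python sketch below assumes the graph of FIG. 2 is the chain v1 → v2 → ... → v8 given by ψ₁ in the table; the helper names `join` and `trace` are hypothetical. It records every ψ_(n), including the final empty one, and every F_(n).

```python
from collections import defaultdict
from collections.abc import Hashable
from typing import Dict, List, Set, Tuple

Pair = Tuple[Hashable, Hashable]


def join(left: Set[Pair], right: Set[Pair]) -> Set[Pair]:
    """All (s, d) such that (s, a) is in left and (a, d) is in right."""
    by_source: Dict[Hashable, Set[Hashable]] = defaultdict(set)
    for a, d in right:
        by_source[a].add(d)
    return {(s, d) for (s, a) in left for d in by_source.get(a, ())}


def trace(f: Set[Pair]) -> Tuple[List[Set[Pair]], List[Set[Pair]]]:
    """Return [psi_1, psi_2, ...] (ending with the empty psi) and [F_1, F_2, ...]."""
    F, psi = set(f), set(f)
    psis, Fs = [psi], [F]
    while psi:
        psi_next = join(psi, psi) - F
        psis.append(psi_next)
        if psi_next:
            F = F | psi_next | join(psi_next, F)
            Fs.append(F)
        psi = psi_next
    return psis, Fs


f = {(f"v{i}", f"v{i + 1}") for i in range(1, 8)}  # chain v1 -> ... -> v8
psis, Fs = trace(f)
print([len(p) for p in psis])  # [7, 6, 4, 0]
print([len(F) for F in Fs])    # [7, 18, 28]
print(sorted(psis[2]))         # [('v1', 'v5'), ('v2', 'v6'), ('v3', 'v7'), ('v4', 'v8')]
```

The sizes match the rows of TABLE 4, and the 28 tuples of F₃ are exactly the pairs (vi, vj) with i < j.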

After the system determines that ψ₄(s,d) is empty, the system can end processing. This is because if ψ₄(s,d) is empty, there are no new first tuples to be extended and thus no second tuples either. Therefore, no new tuples are generated on this iteration.

Using this evaluation process, the system can compute the transitive closure in log₂(lsp) + 1 iterations without generating any duplicate tuples.
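The logarithmic growth of the iteration count can be observed directly. In the Python sketch below (hypothetical helper names; chains of integers stand in for relations with a known longest shortest path), the loop count, including the final iteration that finds ψ empty, comes out as ⌊log₂(lsp)⌋ + 1 for every chain tried; Python's `int.bit_length()` computes that value. This matches the log₂(lsp) + 1 figure when lsp is a power of two and is its floor otherwise.

```python
from collections import defaultdict
from collections.abc import Hashable
from typing import Dict, Set, Tuple

Pair = Tuple[Hashable, Hashable]


def join(left: Set[Pair], right: Set[Pair]) -> Set[Pair]:
    """All (s, d) such that (s, a) is in left and (a, d) is in right."""
    by_source: Dict[Hashable, Set[Hashable]] = defaultdict(set)
    for a, d in right:
        by_source[a].add(d)
    return {(s, d) for (s, a) in left for d in by_source.get(a, ())}


def transitive_closure(f: Set[Pair]) -> Tuple[Set[Pair], int]:
    """Doubling evaluation; returns the closure and the number of loop iterations."""
    F, psi, iterations = set(f), set(f), 0
    while True:
        iterations += 1
        first = join(psi, psi) - F
        if not first:
            return F, iterations
        F = F | first | join(first, F)
        psi = first


def iterations_for_chain(lsp: int) -> int:
    """Closure of the chain 0 -> 1 -> ... -> lsp, whose longest shortest path is lsp."""
    F, iterations = transitive_closure({(i, i + 1) for i in range(lsp)})
    assert len(F) == lsp * (lsp + 1) // 2  # every pair (i, j) with i < j
    return iterations


for lsp in (1, 2, 3, 4, 7, 8, 100):
    print(lsp, iterations_for_chain(lsp), lsp.bit_length())
```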

The examples above assume that the tuples of the relation f are ordered. Thus, in the graph context, the relation f represents edges in a directed graph. If, however, the relation f has unordered tuples, e.g., for an undirected graph, the system can use the same procedure described above on a new relation g(s,d) that is the disjunction of f(s,d) and f(d,s). In other words, the relation g includes all tuples in f, as well as new tuples generated by reversing the source and destination elements of the tuples in f.
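For the undirected case, a minimal Python sketch (hypothetical helper names) builds g from f plus every tuple reversed and then runs the same doubling evaluation on g. Note that with a symmetric g, every element that has at least one neighbor is related to itself in the closure, e.g., (a,a) via a → b → a.

```python
from collections import defaultdict
from collections.abc import Hashable
from typing import Dict, Set, Tuple

Pair = Tuple[Hashable, Hashable]


def join(left: Set[Pair], right: Set[Pair]) -> Set[Pair]:
    """All (s, d) such that (s, a) is in left and (a, d) is in right."""
    by_source: Dict[Hashable, Set[Hashable]] = defaultdict(set)
    for a, d in right:
        by_source[a].add(d)
    return {(s, d) for (s, a) in left for d in by_source.get(a, ())}


def transitive_closure(g: Set[Pair]) -> Set[Pair]:
    """Doubling evaluation of the psi/F predicates over g."""
    F, psi = set(g), set(g)
    while True:
        first = join(psi, psi) - F
        if not first:
            return F
        F = F | first | join(first, F)
        psi = first


def undirected_closure(f: Set[Pair]) -> Set[Pair]:
    """Closure of g(s,d) :- f(s,d); f(d,s), i.e., f plus every tuple reversed."""
    g = set(f) | {(d, s) for (s, d) in f}
    return transitive_closure(g)


F = undirected_closure({("a", "b"), ("b", "c")})
print(("c", "a") in F, ("a", "a") in F, len(F))  # True True 9
```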

Embodiments of the subject matter and the functional operations described in this specification can be implemented in digital electronic circuitry, in tangibly-embodied computer software or firmware, in computer hardware, including the structures disclosed in this specification and their structural equivalents, or in combinations of one or more of them. Embodiments of the subject matter described in this specification can be implemented as one or more computer programs, i.e., one or more modules of computer program instructions encoded on a tangible non-transitory storage medium for execution by, or to control the operation of, data processing apparatus. Alternatively or in addition, the program instructions can be encoded on an artificially-generated propagated signal, e.g., a machine-generated electrical, optical, or electromagnetic signal, that is generated to encode information for transmission to suitable receiver apparatus for execution by a data processing apparatus. The computer storage medium can be a machine-readable storage device, a machine-readable storage substrate, a random or serial access memory device, or a combination of one or more of them.

The term "data processing apparatus" refers to data processing hardware and encompasses all kinds of apparatus, devices, and machines for processing data, including by way of example a programmable processor, a computer, or multiple processors or computers. The apparatus can also be or further include special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application-specific integrated circuit). The apparatus can optionally include, in addition to hardware, code that creates an execution environment for computer programs, e.g., code that constitutes processor firmware, a protocol stack, a database management system, an operating system, or a combination of one or more of them.

A computer program, which may also be referred to or described as a program, software, a software application, a module, a software module, a script, or code, can be written in any form of programming language, including compiled or interpreted languages, or declarative or procedural languages, and it can be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment. A computer program may, but need not, correspond to a file in a file system. A program can be stored in a portion of a file that holds other programs or data, e.g., one or more scripts stored in a markup language document, in a single file dedicated to the program in question, or in multiple coordinated files, e.g., files that store one or more modules, sub-programs, or portions of code. A computer program can be deployed to be executed on one computer or on multiple computers that are located at one site or distributed across multiple sites and interconnected by a data communication network.

The processes and logic flows described in this specification can be performed by one or more programmable computers executing one or more computer programs to perform functions by operating on input data and generating output. The processes and logic flows can also be performed by, and apparatus can also be implemented as, special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application-specific integrated circuit).

Computers suitable for the execution of a computer program can be based on general or special purpose microprocessors or both, or any other kind of central processing unit. Generally, a central processing unit will receive instructions and data from a read-only memory or a random access memory or both. The essential elements of a computer are a central processing unit for performing or executing instructions and one or more memory devices for storing instructions and data. Generally, a computer will also include, or be operatively coupled to receive data from or transfer data to, or both, one or more mass storage devices for storing data, e.g., magnetic, magneto-optical disks, or optical disks. However, a computer need not have such devices. Moreover, a computer can be embedded in another device, e.g., a mobile telephone, a personal digital assistant (PDA), a mobile audio or video player, a game console, a Global Positioning System (GPS) receiver, or a portable storage device, e.g., a universal serial bus (USB) flash drive, to name just a few.

Computer-readable media suitable for storing computer program instructions and data include all forms of non-volatile memory, media and memory devices, including by way of example semiconductor memory devices, e.g., EPROM, EEPROM, and flash memory devices; magnetic disks, e.g., internal hard disks or removable disks; magneto-optical disks; and CD-ROM and DVD-ROM disks. The processor and the memory can be supplemented by, or incorporated in, special purpose logic circuitry.

Control of the various systems described in this specification, or portions of them, can be implemented in a computer program product that includes instructions that are stored on one or more non-transitory machine-readable storage media, and that are executable on one or more processing devices. The systems described in this specification, or portions of them, can be implemented as an apparatus, method, or electronic system that may include one or more processing devices and memory to store executable instructions to perform the operations described in this specification.

To provide for interaction with a user, embodiments of the subject matter described in this specification can be implemented on a computer having a display device, e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor, for displaying information to the user and a keyboard and a pointing device, e.g., a mouse or a trackball, by which the user can provide input to the computer. Other kinds of devices can be used to provide for interaction with a user as well; for example, feedback provided to the user can be any form of sensory feedback, e.g., visual feedback, auditory feedback, or tactile feedback; and input from the user can be received in any form, including acoustic, speech, or tactile input. In addition, a computer can interact with a user by sending documents to and receiving documents from a device that is used by the user; for example, by sending web pages to a web browser on a user's device in response to requests received from the web browser.

In addition to the embodiments of the attached claims and the embodiments described above, the following embodiments are also innovative:

Embodiment 1 is a method comprising:

receiving a request to compute a transitive closure F of a relation f, wherein the relation f includes tuples that each relate a source element s to a destination element d;

initializing F of an initial iteration with the tuples in f;

initializing an auxiliary relation of the initial iteration with the tuples in f;

iteratively computing new first tuples and new second tuples on each iteration until no new first tuples are generated, including:

-   generating new first tuples on each iteration by matching destination elements of tuples in the auxiliary relation of the previous iteration with source elements of tuples in the auxiliary relation of the previous iteration,
-   generating second tuples on each iteration by matching destination elements of the new first tuples with source elements of tuples in F of the previous iteration, and
-   adding the new first tuples and the new second tuples to F of the current iteration; and

providing an indication that the tuples in F of the current iteration represent the transitive closure of f.

Embodiment 2 is the method of embodiment 1, further comprising:

generating evaluation predicates that include:

-   a first predicate that when evaluated generates the new first tuples,
-   a second predicate that when evaluated generates the new second tuples, and
-   an F_(n+1) predicate that when evaluated generates a relation having tuples in F_(n), tuples generated by the first predicate, and tuples generated by the second predicate,

wherein iteratively computing new first tuples and new second tuples comprises iteratively evaluating the evaluation predicates.

Embodiment 3 is the method of embodiment 2, wherein:

the first predicate is defined by:

-   ψ₁(s,d) :- f(s,d)
-   ψ_(n+1)(s,d) :- exists(a: ψ_(n)(s,a), ψ_(n)(a,d)), not F_(n)(s,d),

and the second predicate is defined by:

-   F₁(s,d) :- f(s,d)
-   F_(n+1)(s,d) :- F_(n)(s,d); ψ_(n+1)(s,d); exists(a: ψ_(n+1)(s,a), F_(n)(a,d)).

Embodiment 4 is the method of any one of embodiments 1-3, further comprising:

adding the new first tuples to a new auxiliary relation for a current iteration.

Embodiment 5 is the method of any one of embodiments 1-4, wherein the new first tuples for an iteration n include tuples representing 2^(n) chained uses of f.

Embodiment 6 is the method of any one of embodiments 1-5, wherein the second tuples for an iteration n include tuples representing between 2^(n) and 2^(n+1) chained uses of f.

Embodiment 7 is the method of any one of embodiments 1-6, further comprising computing the transitive relation of f while generating duplicate tuples only due to branches and loops.

Embodiment 8 is the method of any one of embodiments 1-7, further comprising computing the transitive relation of f in O(log₂(lsp)) iterations, wherein lsp represents a length of the longest shortest path of tuples in F.

Embodiment 9 is a system comprising: one or more computers and one or more storage devices storing instructions that are operable, when executed by the one or more computers, to cause the one or more computers to perform the method of any one of embodiments 1 to 8.

Embodiment 10 is a computer storage medium encoded with a computer program, the program comprising instructions that are operable, when executed by data processing apparatus, to cause the data processing apparatus to perform the method of any one of embodiments 1 to 8.

While this specification contains many specific implementation details, these should not be construed as limitations on the scope of any invention or on the scope of what may be claimed, but rather as descriptions of features that may be specific to particular embodiments of particular inventions. Certain features that are described in this specification in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination. Moreover, although features may be described above as acting in certain combinations and even initially claimed as such, one or more features from a claimed combination can in some cases be excised from the combination, and the claimed combination may be directed to a subcombination or variation of a subcombination.

Similarly, while operations are depicted in the drawings in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results. In certain circumstances, multitasking and parallel processing may be advantageous. Moreover, the separation of various system modules and components in the embodiments described above should not be understood as requiring such separation in all embodiments, and it should be understood that the described program components and systems can generally be integrated together in a single software product or packaged into multiple software products.

Particular embodiments of the subject matter have been described. Other embodiments are within the scope of the following claims. For example, the actions recited in the claims can be performed in a different order and still achieve desirable results. As one example, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some cases, multitasking and parallel processing may be advantageous.

What is claimed is:
 1. A computer-implemented method comprising: receiving a request to compute a transitive closure F of a relation f, wherein the relation f includes tuples that each relate a source element s to a destination element d; initializing F of an initial iteration with the tuples in f; initializing an auxiliary relation of the initial iteration with the tuples in f; iteratively computing new first tuples and new second tuples on each iteration until no new first tuples are generated, including: generating new first tuples on each iteration by matching destination elements of tuples in the auxiliary relation of a previous iteration with source elements of tuples in the auxiliary relation of the previous iteration, generating second tuples on each iteration by matching destination elements of the new first tuples with source elements of tuples in F of the previous iteration, and adding the new first tuples and the new second tuples to F of a current iteration; and providing an indication that the tuples in F of the current iteration represent the transitive closure of f.
 2. The method of claim 1, further comprising: generating evaluation predicates that include: a first predicate that when evaluated generates the new first tuples, a second predicate that when evaluated generates the new second tuples, and an F_(n+1) predicate that when evaluated generates a relation having tuples in F_(n), tuples generated by the first predicate, and tuples generated by the second predicate, wherein iteratively computing new first tuples and new second tuples comprises iteratively evaluating the evaluation predicates.
 3. The method of claim 2, wherein: the first predicate is defined by:
ψ₁(s,d) :- f(s,d)
ψ_(n+1)(s,d) :- exists(a: ψ_(n)(s,a), ψ_(n)(a,d)), not F_(n)(s,d),
and the second predicate is defined by:
F₁(s,d) :- f(s,d)
F_(n+1)(s,d) :- F_(n)(s,d); ψ_(n+1)(s,d); exists(a: ψ_(n+1)(s,a), F_(n)(a,d)).
 4. The method of claim 1, further comprising: adding the new first tuples to a new auxiliary relation for a current iteration.
 5. The method of claim 1, wherein the new first tuples for an iteration n include tuples representing 2^(n) chained uses of f.
 6. The method of claim 1, wherein the second tuples for an iteration n include tuples representing between 2^(n) and 2^(n+1) chained uses of f.
 7. The method of claim 1, further comprising computing the transitive relation of f while generating duplicate tuples only due to branches and loops.
 8. The method of claim 1, further comprising computing the transitive relation of f in O(log₂(lsp)) iterations, wherein lsp represents a length of a longest shortest path of tuples in F.
 9. A system comprising: one or more computers and one or more storage devices storing instructions that are operable, when executed by the one or more computers, to cause the one or more computers to perform operations comprising: receiving a request to compute a transitive closure F of a relation f, wherein the relation f includes tuples that each relate a source element s to a destination element d; initializing F of an initial iteration with the tuples in f; initializing an auxiliary relation of the initial iteration with the tuples in f; iteratively computing new first tuples and new second tuples on each iteration until no new first tuples are generated, including: generating new first tuples on each iteration by matching destination elements of tuples in the auxiliary relation of a previous iteration with source elements of tuples in the auxiliary relation of the previous iteration, generating second tuples on each iteration by matching destination elements of the new first tuples with source elements of tuples in F of the previous iteration, and adding the new first tuples and the new second tuples to F of a current iteration; and providing an indication that the tuples in F of the current iteration represent the transitive closure of f.
 10. The system of claim 9, wherein the operations further comprise: generating evaluation predicates that include: a first predicate that when evaluated generates the new first tuples, a second predicate that when evaluated generates the new second tuples, and an F_(n+1) predicate that when evaluated generates a relation having tuples in F_(n), tuples generated by the first predicate, and tuples generated by the second predicate, wherein iteratively computing new first tuples and new second tuples comprises iteratively evaluating the evaluation predicates.
 11. The system of claim 10, wherein: the first predicate is defined by:
ψ₁(s,d) :- f(s,d)
ψ_(n+1)(s,d) :- exists(a: ψ_(n)(s,a), ψ_(n)(a,d)), not F_(n)(s,d),
and the second predicate is defined by:
F₁(s,d) :- f(s,d)
F_(n+1)(s,d) :- F_(n)(s,d); ψ_(n+1)(s,d); exists(a: ψ_(n+1)(s,a), F_(n)(a,d)).
 12. The system of claim 9, wherein the operations further comprise: adding the new first tuples to a new auxiliary relation for a current iteration.
 13. The system of claim 9, wherein the new first tuples for an iteration n include tuples representing 2^(n) chained uses of f.
 14. The system of claim 9, wherein the second tuples for an iteration n include tuples representing between 2^(n) and 2^(n+1) chained uses of f.
 15. The system of claim 9, wherein the operations further comprise computing the transitive relation of f while generating duplicate tuples only due to branches and loops.
 16. The system of claim 9, wherein the operations further comprise computing the transitive relation of f in O(log₂(lsp)) iterations, wherein lsp represents a length of a longest shortest path of tuples in F.
 17. A computer program product, encoded on one or more non-transitory computer storage media, comprising instructions that when executed by one or more computers cause the one or more computers to perform operations comprising: receiving a request to compute a transitive closure F of a relation f, wherein the relation f includes tuples that each relate a source element s to a destination element d; initializing F of an initial iteration with the tuples in f; initializing an auxiliary relation of the initial iteration with the tuples in f; iteratively computing new first tuples and new second tuples on each iteration until no new first tuples are generated, including: generating new first tuples on each iteration by matching destination elements of tuples in the auxiliary relation of a previous iteration with source elements of tuples in the auxiliary relation of the previous iteration, generating second tuples on each iteration by matching destination elements of the new first tuples with source elements of tuples in F of the previous iteration, and adding the new first tuples and the new second tuples to F of a current iteration; and providing an indication that the tuples in F of the current iteration represent the transitive closure of f.
 18. The computer program product of claim 17, wherein the operations further comprise: generating evaluation predicates that include: a first predicate that when evaluated generates the new first tuples, a second predicate that when evaluated generates the new second tuples, and an F_(n+1) predicate that when evaluated generates a relation having tuples in F_(n), tuples generated by the first predicate, and tuples generated by the second predicate, wherein iteratively computing new first tuples and new second tuples comprises iteratively evaluating the evaluation predicates.
 19. The computer program product of claim 18, wherein: the first predicate is defined by:
ψ₁(s,d) :- f(s,d)
ψ_(n+1)(s,d) :- exists(a: ψ_(n)(s,a), ψ_(n)(a,d)), not F_(n)(s,d),
and the second predicate is defined by:
F₁(s,d) :- f(s,d)
F_(n+1)(s,d) :- F_(n)(s,d); ψ_(n+1)(s,d); exists(a: ψ_(n+1)(s,a), F_(n)(a,d)).
 20. The computer program product of claim 17, wherein the operations further comprise: adding the new first tuples to a new auxiliary relation for a current iteration.
 21. The computer program product of claim 17, wherein the new first tuples for an iteration n include tuples representing 2^(n) chained uses of f.
 22. The computer program product of claim 17, wherein the second tuples for an iteration n include tuples representing between 2^(n) and 2^(n+1) chained uses of f.
 23. The computer program product of claim 17, wherein the operations further comprise computing the transitive relation of f while generating duplicate tuples only due to branches and loops.
 24. The computer program product of claim 17, wherein the operations further comprise computing the transitive relation of f in O(log₂(lsp)) iterations, wherein lsp represents a length of a longest shortest path of tuples in F.